Homework 10 : Conditioning and stability of linear least squares


The least squares problem

(1)Ax=b

where ARm×n, mn has four associated "conditioning" problems, described in the table in Theorem 18.1 of TB (page 131). These are

  1. Sensitivity of y=Ax to right hand side vector b,

  2. Sensitivity of the solution x to right hand side vector b,

  3. Sensitivity of y=Ax to the coefficient matrix A, and

  4. Sensitivity of the solution x to the coefficient matrix A.

Problem 1


Sensitivity of y to a perturbation in b.

In TB Lecture 12, the relative condition number is defined as

(2)κ=supδx(δff(x)/δxx)

Problem 1(a)

Arguing directly from this definition, establish the condition number of y with respect to perburbations in b given by TB Lecture 18

(3)κ=1cosθ

Hint: The input "x" in this problem is b and the output (or model) "f" is y. Show geometrically that the supremum is attained with Pδb=δb.

Solution

Since the relative condition number is defined as ; (4)κ=supδx(δff(x)/δxx),

then the relative condition number of y with respect to perburbations in b is given by;

(5)κ=supδb(δyy/δbb)

where y=Ax=Pb is the the projetion of b onto the range(A), δb is a pertubation in b with a correponding pertubation δy in y (which is the projetion of δb onto the range(A)) . All this is illustrated geometrically in the image below;

IMG-2413.jpg

From the above image we notice that cos(θ)=yb where θ is the angle between b and the range(A).

The relative condition number of y with respect to perburbations in b then becomes;

κ=supδb(1cos(θ)×δyδb)=supδb(1cos(θ)×Pδbδb)=1cos(θ)×supδb(Pδbδb)

But to find the supδb(Pδbδb), we must maximize the ratio Pδbδb.

From the diagram above we see that; for 0<β<π/2, δb will always be greater than δy and therefore to maximize Pδbδb, β must be equal to zero and δb=Pδb.

This means that we obtain supδb(Pδbδb) when δb=Pδb and therefore the condition number of y with respect to perburbations in b becomes; (6)κ=1cosθ

Problem 1(b)

For θ=π/2, the condition number is . Illustrate what this means by considering the least squares problem

(7)[21][x]=[12]

Use the results in TB 11.11 and 11.12 (page 82) to determine the projection operator P for this problem. Then compute y=Pb and show that Pb=0. Find a perturbation δb so that Pδb=δb=δy0. Explain what a condition number κ= might mean here. Illustrate your argument graphically.

Solution

From the above system we have that;

A=[21] and b=[12]

To find the the projection operator for this problem, we use P=AA, but A=(ATA)AT=([21][21])1[21]=15[21]P=15[21][21]=[45252515]y=Pb=[45252515][12]=[45+4525+25]Pb=[00]=0

To find δb, will let δb=b+u where b=[12] and u=[xy]such that Pδb=δb that is; P(b+u)=(b+u)15[4221][1+x2+y]=([1+x2+y])[4x+2y2x+y]=([5+5x10+5y]) Equating elements in the two vectors we obtain; (8)x=2y+5(9)4y=102x Subbstituting for x in equation (26), we obtain; (10)4y=102(2y+5)(11)0=0 This means that y can take on any real number ie. y=tR.

Chosing y=t=0, this gives x=5 and u=[50]

δb=[1+52+0]=[42]δy=Pδb=15[4221][42]=[42]=δb

This can be geometrically illustrated as shown in the image below;

IMG-2414.jpg

From the graph above, we see that the projection of b onto the range(A) is a zero vector, that is y=Pb=0. Surprisingly, a small change in b say δb results into non-zero pertubation δy in the projection. This means that the residual r=bAx=b and that the change in solution is very very large.

With reference to the above ideas, we conclude that when θ=π/2, the condition number blows up i.e. κ=.

The same conclusion can be reached by using the definition of the condition number of y with respect to perburbations in b; κ=1cos(θ)=1cos(π/2)=

Problem 1(c)

Now consider the problem

(12)[21][x]=[21]

For this problem, show that κ=1. What is qualitatively different about this problem than the problem in which κ=?

Solution

IMG-2417.jpg

Since in 1(a) we proved that κ=1cos(θ) and the angle between b and the range(A) is zero, the condition numder for this case is then given as; κ=1cos(θ)=1cos(0)=1

For this problem, we see that the projection of b onto the column space of A is b itself, that is y=Pb=b. This means that b is in the range(A), the residual r=bAx=0 and that the change in the solution is very very small which isn't the case when κ=.

Problem 2


Problem 18.1 in TB (page 136)

2(a)
2(b)
2(c)
2(d)
2(e)

(i)

The first condition number Kby is attained when δb=Pδb and this occurs when δb is equal to zero except in the first n=2 entries (n is the number of columns of A),

I therefore used δb as [0.00010.00010] and this gave Kby=1.29108

(ii)

The condition number Kbx is attained when Aδb2=A2δb2 and this occurs when δb is equal to zero except in the 2nd entry.

For this I used δb=[01080] this gave Kby=54775.17695228298

(iii)

The condition number KAy is attained when δA=δpv2T where δp is the vector orthogonal to the range(A) and v2 is the second column of the right singular matrix v.

for this I used δp=(IAA)[1071070], v2T=[0.707130350.70708321] and obtained δA=54775.16415665296

Problem 3


Show that if (λ,v) is an eigenvalue/eigenvector pair for matrix A, then ((λμ)1,v) is an eigenvalue/eigenvector pair for the matrix (AμI)1.

Why is this observation useful when using the power iteration to find an eigenvalue close to μ?

Solution

If λ is an eigenvalue of matrix A with a corresponding eigenvector v, then we have that;

(13)Av=λv

Subtracting μv from equation() above we obtain;

(14)Avμv=λvμv(15)(AμI)v=(λμ)v

Since μR is not an eigenvalue of A, then AμI is invertible and therefore multiplying the above equation with (AμI)1, we get;

(16)(AμI)1(AμI)v=(λμ)(AμI)1v(17)(18)v=(λμ)(AμI)1v(19)(20)(AμI)1v=(λμ)1v

Therefore from the above equation, we can conclude that ((λμ)1,v) is an eigenvalue/eigenvector pair for the matrix (AμI)1.

Why is this observation useful when using the power iteration to find an eigenvalue close to μ?

This observation is useful when using the power iteration to find an eigenvalue (λj) close to μ because the dominant eigenvalue of (AμI)1 will be (λjμ)1 from which we can easily obtain λj.

Problem 4


Exercise 29.1 (Lecture 29, TB page 223). This is a five part problem that asks you to code an eigenvalue solver for a real, symmetric matrix using the shifted QR algorithm. Do your code in Python, using the Numpy qr algorithm where needed.

The basic steps are :

  1. Reduce your matrix A to tridiagonal form. You may use the hessenberg code we wrote in class.

  2. Implement the unshifted QR code (also done in class). Use the Numpy routine qr. Your iteration should stop when the off diagonal elements are smaller (in absolute value) than τ1012.

  3. Find all eigenvalues of a matrix A using the "deflation" idea described in Algorithm 28.2.

  4. Introduce the Wilkinson shift, described in Lecture 29.

Notes

(21)Hij=1i+j1,i,j=1,2,m

Part (e)